
@CrooseGit

There is an issue on armv7 where a function won't be inlined due to mismatched target features between caller and callee.
The caller has HasV8Ops and FeatureDotProd while the callee does not, but AFAIK this should not be a problem.
https://godbolt.org/z/f19h3zT66 is an example showing how the call is not inlined on armv7.
The expected asm output would be something like:

.fnstart
	vsdot.s8	q0, q1, d4[0]
	bx	lr
.Lfunc_end0:

Thanks to @Amichaxx, we narrowed it down and can now resolve this problem by adding ARM::FeatureDotProd and ARM::HasV8Ops to InlineFeaturesAllowed in llvm/lib/Target/ARM/ARMTargetTransformInfo.h, after which the function is inlined successfully.
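A minimal model of why the allowlist change helps, assuming MatchExact (not shown in the hunk below) requires every bit *outside* InlineFeaturesAllowed to be identical, alongside the subset test on the allowlisted bits. It uses std::bitset in place of llvm::FeatureBitset, and the bit positions NEON, DotProd, V8Ops and ThumbMode are hypothetical stand-ins for the generated ARM:: feature enums:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical bit positions; the real ARM:: feature indices are generated
// into ARMGenSubtargetInfo.inc.
constexpr std::size_t NEON = 0;
constexpr std::size_t DotProd = 1;
constexpr std::size_t V8Ops = 2;
constexpr std::size_t ThumbMode = 3; // never allowed to differ

using Bits = std::bitset<8>; // stand-in for llvm::FeatureBitset

// Assumed model of ARMTTIImpl::areInlineCompatible.
bool inlineCompatible(const Bits &Caller, const Bits &Callee,
                      const Bits &Allowed) {
  // Bits outside the allowlist must be identical in caller and callee.
  bool MatchExact = (Caller & ~Allowed) == (Callee & ~Allowed);
  // Allowlisted bits set in the callee must also be set in the caller.
  bool MatchSubset = ((Caller & Callee) & Allowed) == (Callee & Allowed);
  return MatchExact && MatchSubset;
}
```

With Caller = {NEON, DotProd, V8Ops} and Callee = {NEON}, an allowlist of {NEON} fails MatchExact because DotProd and V8Ops differ outside it, while {NEON, DotProd, V8Ops} passes both checks. Swapping the roles (callee needs DotProd, caller lacks it) is still rejected by MatchSubset, and a ThumbMode mismatch still blocks inlining either way.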

Whilst we were at it, we also added some debug output to make it easier to tell why (or why not) a function is being inlined on ARM, and added a couple of other features (FeatureBF16 and FeatureSB) that seem to be missing from the list.

This patch was motivated by an issue experienced with Rust that was traced back to LLVM, and is designed to address it.

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-arm

Author: Croose (CrooseGit)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/169337.diff

2 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+44)
  • (modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.h (+1)
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index d12b802fe234f..89ebc3e715930 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -102,6 +102,50 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
   // the callers'.
   bool MatchSubset = ((CallerBits & CalleeBits) & InlineFeaturesAllowed) ==
                      (CalleeBits & InlineFeaturesAllowed);
+
+  LLVM_DEBUG({
+    dbgs() << "=== Inline compatibility debug ===\n";
+    dbgs() << "Caller: " << Caller->getName() << "\n";
+    dbgs() << "Callee: " << Callee->getName() << "\n";
+
+    // Bit diffs
+    FeatureBitset MissingInCaller = CalleeBits & ~CallerBits; // callee-only
+    FeatureBitset ExtraInCaller   = CallerBits & ~CalleeBits; // caller-only
+
+    // Counts
+    dbgs() << "Only-in-caller bit count: " << ExtraInCaller.count() << "\n";
+    dbgs() << "Only-in-callee bit count: " << MissingInCaller.count() << "\n";
+ 
+    dbgs() << "Only-in-caller feature indices [";
+    {
+      bool First = true;
+      for (size_t I = 0, E = ExtraInCaller.size(); I < E; ++I) {
+        if (ExtraInCaller.test(I)) {
+          if (!First) dbgs() << ", ";
+          dbgs() << I;
+          First = false;
+        }
+      }
+    }
+    dbgs() << "]\n";
+
+    dbgs() << "Only-in-callee feature indices [";
+    {
+      bool First = true;
+      for (size_t I = 0, E = MissingInCaller.size(); I < E; ++I) {
+        if (MissingInCaller.test(I)) {
+          if (!First) dbgs() << ", ";
+          dbgs() << I;
+          First = false;
+        }
+      }
+    }
+    dbgs() << "]\n";
+
+    // Indicies map to features as found in llvm-project/(your_build)/lib/Target/ARM/ARMGenSubtargetInfo.inc
+    dbgs() << "MatchExact="  << (MatchExact  ? "true" : "false")
+           << " MatchSubset=" << (MatchSubset ? "true" : "false") << "\n";
+  }); 
   return MatchExact && MatchSubset;
 }
 
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 919a6fc9fd0b0..2ecfce0de9f55 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -70,6 +70,7 @@ class ARMTTIImpl final : public BasicTTIImplBase<ARMTTIImpl> {
   // -thumb-mode in a caller with +thumb-mode, may cause the assembler to
   // fail if the callee uses ARM only instructions, e.g. in inline asm.
   const FeatureBitset InlineFeaturesAllowed = {
+      ARM::FeatureDotProd, ARM::HasV8Ops, ARM::FeatureBF16, ARM::FeatureSB,
       ARM::FeatureVFP2, ARM::FeatureVFP3, ARM::FeatureNEON, ARM::FeatureThumb2,
       ARM::FeatureFP16, ARM::FeatureVFP4, ARM::FeatureFPARMv8,
       ARM::FeatureFullFP16, ARM::FeatureFP16FML, ARM::FeatureHWDivThumb,
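The two index-printing loops in the debug block above are identical apart from the bitset they walk. A sketch of factoring them into one helper, using std::bitset and std::ostream as stand-ins for FeatureBitset and dbgs() (the diff already relies on size() and test(), which std::bitset also provides). The names printSetIndices and setIndicesToString are hypothetical:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

using Bits = std::bitset<16>; // stand-in for llvm::FeatureBitset

// Prints the indices of all set bits as "[i, j, k]"; one helper like this
// could replace both near-identical loops in the debug block.
void printSetIndices(std::ostream &OS, const Bits &B) {
  OS << '[';
  bool First = true;
  for (std::size_t I = 0, E = B.size(); I < E; ++I) {
    if (!B.test(I))
      continue;
    if (!First)
      OS << ", ";
    OS << I;
    First = false;
  }
  OS << ']';
}

// Convenience wrapper that returns the formatted list as a string.
std::string setIndicesToString(const Bits &B) {
  std::ostringstream SS;
  printSetIndices(SS, B);
  return SS.str();
}
```

Inside LLVM the helper would presumably take llvm::raw_ostream & and FeatureBitset instead; that adaptation is an assumption, not part of this PR. As the diff's comment notes, the printed indices still have to be mapped to feature names via ARMGenSubtargetInfo.inc.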

@Amichaxx
Contributor

Amichaxx commented Nov 24, 2025

Hi @davemgreen, I saw you edited InlineFeaturesAllowed most recently.
I was wondering why this isn't implemented as a smaller exclusion list (e.g. ModeThumb, FeatureNoARM, ModeSoftFloat, FeatureFP64, FeatureD32 must match), which would potentially mean less maintenance in future. Is there a technical reason for keeping the allowlist?

@adamgemmell

The original review requested an allowlist, as missed optimisations are preferable to miscompilations.
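To make that trade-off concrete, a sketch comparing the allowlist policy with the exclusion-list (denylist) idea raised above. Bits are std::bitset stand-ins for FeatureBitset, and denylistCompatible plus the bit positions used in the example are hypothetical. A feature nobody has classified yet, set only in the caller, blocks inlining under the allowlist (a missed optimisation) but is silently accepted under the denylist, which is where a miscompile could slip in if that feature does affect what the callee may contain:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

using Bits = std::bitset<8>; // stand-in for llvm::FeatureBitset

// Allowlist policy (current ARM behaviour): listed features may differ as
// long as the callee's bits are a subset of the caller's; every other
// feature must match exactly.
bool allowlistCompatible(const Bits &Caller, const Bits &Callee,
                         const Bits &Allowed) {
  bool MatchExact = (Caller & ~Allowed) == (Callee & ~Allowed);
  bool MatchSubset = ((Caller & Callee) & Allowed) == (Callee & Allowed);
  return MatchExact && MatchSubset;
}

// Hypothetical denylist policy: only listed features must match exactly;
// every other feature just needs the callee's bits to be a subset of the
// caller's.
bool denylistCompatible(const Bits &Caller, const Bits &Callee,
                        const Bits &MustMatch) {
  bool MatchExact = (Caller & MustMatch) == (Callee & MustMatch);
  const Bits Rest = ~MustMatch;
  bool MatchSubset = ((Caller & Callee) & Rest) == (Callee & Rest);
  return MatchExact && MatchSubset;
}
```

The two policies agree on every feature someone has explicitly classified; they differ only on features missing from both lists, which the allowlist treats conservatively.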

@github-actions

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index 89ebc3e71..f0d378b66 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -110,18 +110,19 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
 
     // Bit diffs
     FeatureBitset MissingInCaller = CalleeBits & ~CallerBits; // callee-only
-    FeatureBitset ExtraInCaller   = CallerBits & ~CalleeBits; // caller-only
+    FeatureBitset ExtraInCaller = CallerBits & ~CalleeBits;   // caller-only
 
     // Counts
     dbgs() << "Only-in-caller bit count: " << ExtraInCaller.count() << "\n";
     dbgs() << "Only-in-callee bit count: " << MissingInCaller.count() << "\n";
- 
+
     dbgs() << "Only-in-caller feature indices [";
     {
       bool First = true;
       for (size_t I = 0, E = ExtraInCaller.size(); I < E; ++I) {
         if (ExtraInCaller.test(I)) {
-          if (!First) dbgs() << ", ";
+          if (!First)
+            dbgs() << ", ";
           dbgs() << I;
           First = false;
         }
@@ -134,7 +135,8 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
       bool First = true;
       for (size_t I = 0, E = MissingInCaller.size(); I < E; ++I) {
         if (MissingInCaller.test(I)) {
-          if (!First) dbgs() << ", ";
+          if (!First)
+            dbgs() << ", ";
           dbgs() << I;
           First = false;
         }
@@ -142,10 +144,11 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
     }
     dbgs() << "]\n";
 
-    // Indicies map to features as found in llvm-project/(your_build)/lib/Target/ARM/ARMGenSubtargetInfo.inc
-    dbgs() << "MatchExact="  << (MatchExact  ? "true" : "false")
+    // Indicies map to features as found in
+    // llvm-project/(your_build)/lib/Target/ARM/ARMGenSubtargetInfo.inc
+    dbgs() << "MatchExact=" << (MatchExact ? "true" : "false")
            << " MatchSubset=" << (MatchSubset ? "true" : "false") << "\n";
-  }); 
+  });
   return MatchExact && MatchSubset;
 }
 
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 2ecfce0de..87fee9a1b 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -70,32 +70,69 @@ class ARMTTIImpl final : public BasicTTIImplBase<ARMTTIImpl> {
   // -thumb-mode in a caller with +thumb-mode, may cause the assembler to
   // fail if the callee uses ARM only instructions, e.g. in inline asm.
   const FeatureBitset InlineFeaturesAllowed = {
-      ARM::FeatureDotProd, ARM::HasV8Ops, ARM::FeatureBF16, ARM::FeatureSB,
-      ARM::FeatureVFP2, ARM::FeatureVFP3, ARM::FeatureNEON, ARM::FeatureThumb2,
-      ARM::FeatureFP16, ARM::FeatureVFP4, ARM::FeatureFPARMv8,
-      ARM::FeatureFullFP16, ARM::FeatureFP16FML, ARM::FeatureHWDivThumb,
-      ARM::FeatureHWDivARM, ARM::FeatureDB, ARM::FeatureV7Clrex,
-      ARM::FeatureAcquireRelease, ARM::FeatureSlowFPBrcc,
-      ARM::FeaturePerfMon, ARM::FeatureTrustZone, ARM::Feature8MSecExt,
-      ARM::FeatureCrypto, ARM::FeatureCRC, ARM::FeatureRAS,
-      ARM::FeatureFPAO, ARM::FeatureFuseAES, ARM::FeatureZCZeroing,
-      ARM::FeatureProfUnpredicate, ARM::FeatureSlowVGETLNi32,
-      ARM::FeatureSlowVDUP32, ARM::FeaturePreferVMOVSR,
-      ARM::FeaturePrefISHSTBarrier, ARM::FeatureMuxedUnits,
-      ARM::FeatureSlowOddRegister, ARM::FeatureSlowLoadDSubreg,
-      ARM::FeatureDontWidenVMOVS, ARM::FeatureExpandMLx,
-      ARM::FeatureHasVMLxHazards, ARM::FeatureNEONForFPMovs,
-      ARM::FeatureNEONForFP, ARM::FeatureCheckVLDnAlign,
-      ARM::FeatureHasSlowFPVMLx, ARM::FeatureHasSlowFPVFMx,
-      ARM::FeatureVMLxForwarding, ARM::FeaturePref32BitThumb,
-      ARM::FeatureAvoidPartialCPSR, ARM::FeatureCheapPredicableCPSR,
-      ARM::FeatureAvoidMOVsShOp, ARM::FeatureHasRetAddrStack,
-      ARM::FeatureHasNoBranchPredictor, ARM::FeatureDSP, ARM::FeatureMP,
-      ARM::FeatureVirtualization, ARM::FeatureMClass, ARM::FeatureRClass,
-      ARM::FeatureAClass, ARM::FeatureStrictAlign, ARM::FeatureLongCalls,
-      ARM::FeatureExecuteOnly, ARM::FeatureReserveR9, ARM::FeatureNoMovt,
-      ARM::FeatureNoNegativeImmediates
-  };
+      ARM::FeatureDotProd,
+      ARM::HasV8Ops,
+      ARM::FeatureBF16,
+      ARM::FeatureSB,
+      ARM::FeatureVFP2,
+      ARM::FeatureVFP3,
+      ARM::FeatureNEON,
+      ARM::FeatureThumb2,
+      ARM::FeatureFP16,
+      ARM::FeatureVFP4,
+      ARM::FeatureFPARMv8,
+      ARM::FeatureFullFP16,
+      ARM::FeatureFP16FML,
+      ARM::FeatureHWDivThumb,
+      ARM::FeatureHWDivARM,
+      ARM::FeatureDB,
+      ARM::FeatureV7Clrex,
+      ARM::FeatureAcquireRelease,
+      ARM::FeatureSlowFPBrcc,
+      ARM::FeaturePerfMon,
+      ARM::FeatureTrustZone,
+      ARM::Feature8MSecExt,
+      ARM::FeatureCrypto,
+      ARM::FeatureCRC,
+      ARM::FeatureRAS,
+      ARM::FeatureFPAO,
+      ARM::FeatureFuseAES,
+      ARM::FeatureZCZeroing,
+      ARM::FeatureProfUnpredicate,
+      ARM::FeatureSlowVGETLNi32,
+      ARM::FeatureSlowVDUP32,
+      ARM::FeaturePreferVMOVSR,
+      ARM::FeaturePrefISHSTBarrier,
+      ARM::FeatureMuxedUnits,
+      ARM::FeatureSlowOddRegister,
+      ARM::FeatureSlowLoadDSubreg,
+      ARM::FeatureDontWidenVMOVS,
+      ARM::FeatureExpandMLx,
+      ARM::FeatureHasVMLxHazards,
+      ARM::FeatureNEONForFPMovs,
+      ARM::FeatureNEONForFP,
+      ARM::FeatureCheckVLDnAlign,
+      ARM::FeatureHasSlowFPVMLx,
+      ARM::FeatureHasSlowFPVFMx,
+      ARM::FeatureVMLxForwarding,
+      ARM::FeaturePref32BitThumb,
+      ARM::FeatureAvoidPartialCPSR,
+      ARM::FeatureCheapPredicableCPSR,
+      ARM::FeatureAvoidMOVsShOp,
+      ARM::FeatureHasRetAddrStack,
+      ARM::FeatureHasNoBranchPredictor,
+      ARM::FeatureDSP,
+      ARM::FeatureMP,
+      ARM::FeatureVirtualization,
+      ARM::FeatureMClass,
+      ARM::FeatureRClass,
+      ARM::FeatureAClass,
+      ARM::FeatureStrictAlign,
+      ARM::FeatureLongCalls,
+      ARM::FeatureExecuteOnly,
+      ARM::FeatureReserveR9,
+      ARM::FeatureNoMovt,
+      ARM::FeatureNoNegativeImmediates};
 
   const ARMSubtarget *getST() const { return ST; }
   const ARMTargetLowering *getTLI() const { return TLI; }

Fixes issue where functions are not inlined when caller has these
features, but callee does not.
This makes it easier to see why your function isn't getting inlined.
@Amichaxx
Contributor

@adamgemmell thanks, I missed that. Then I suppose missing features will be added as they are spotted (provided they are allowed to differ).

@CrooseGit force-pushed the dev/reucru01/armv7-inlining-fix branch from 45a393b to c053e07 on November 25, 2025 10:24
Collaborator

@davemgreen left a comment


Hi - What makes a feature invalid for inlining? That it disables the use of some instructions/registers? It would seem that many more features (including FeatureFP64 and FeatureD32) could be added to the list. Why is HasV8Ops there if HasV8_1aOps isn't? Why is FeatureDotProd there, but not FeatureAES?

Can you add a test? Both for inlining a function with dotprod and for inlining to a function with dotprod.

Removes FeatureD32 and FeatureFP64 from the blocklist in the comments, because:
- In https://reviews.llvm.org/D34697#805590, D16 and VFPOnlySP were added to this allowlist because they do "the opposite of what you would expect".
- https://github.com/llvm/llvm-project/commit/760df47b778a530e9368a4b8706940ba103d57ba#diff-8165208908f69b3582d556451[…]6c4b474f2bf32c4ac7fec031cf2efd replaces the previous features with their inverses, but incorrectly keeps them in the allowlist even though the original reasoning no longer applies.

Some subtarget features provide different instructions depending on whether they are set or unset; these features are believed to be safe because *not* having them present does not add instructions.
